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ABSTRACT 


To  adequately  describe  the  general  circulation  of  the  ocean  requires  a  fine  spatial  resolution 
and  extremely  small  time  steps  for  model  integration.  This  combination  leads  to  large  in-core 
computer  memories  and  computer  processing  time  for  the  model  integration.  In  addition,  the 
numerical  models  have  to  be  coupled  with  data  assimilation  schemes  to  derive  accurate  initial 
conditions  computing  model  forecasts.  To  make  the  nowcasting  and  forecasting  practical,  it  is 
necessary  that  the  data  assimilation  schemes  be  computationally  efficient  and  parsimonious  in 
computer  in-core  memories. 

The  work  reported  in  this  technical  report  was  initiated  at  the  Institute  for  Naval  Oceanog¬ 
raphy  ( INO)  to  develop  data  assimilation  procedures  that  are  efficient  and  are  widely  applicable. 
Our  initial  data  assimilation  work  was  done  with  a  scheme  which  was  developed  by  Derber  and 
Rosati  (1989)  for  their  global  data  assimilation  application.  The  scheme  is  quite  efficient;  but 
when  it  was  combined  with  a  primitive  equation,  numerical  model  of  the  Gulf  of  Mexico,  the 
nowcasting  step  took  too  much  computer  time  for  it  to  be  useful  in  real-time  applications.  So, 
after  an  initial  implementation  and  test  of  the  scheme,  our  obvious  thrust  was  to  enhance  and 
make  it  more  efficient. 

Most  of  the  computation  resources  while  using  the  Derber-Rosati  scheme  are  required  in 
approximating  a  matrix  product  S,„g,  where  Sm  is  the  covariance  matrix  of  the  model  output 
error  having  a  Gaussian  spatial  covariance  structure  that  stays  fixed,  and  g  is  a  varying  vector. 
They  approximate  this  product  by  repeated  applications  of  a  Laplacian  operator. 

A  new  algorithm  is  developed  that,  while  maintaining  the  accuracy,  approximates  S,„g 
far  more  efficiently  than  the  Laplacian  algorithm.  It  also  admits  a  wider  class  of  covariance 
structures  with  equal  ease  and  efficiency.  The  new  algorithm  approximates  Snig  by  applying  a 
product  of  two  operators  that  are  polynomials  in  simple  averaging  operators.  The  algorithm  is 
efficient  and  general  with  the  only  restriction  that  Sm  be  based  on  a  covariance  function  that 
is  a  product  of  factors,  which  are  even  functions  in  a  single  dimension. 

The  purpose  of  this  technical  report  is  to  provide  details  of  the  Derber-Rosati  algorithm, 
which  has  since  found  a  much  wider  acceptance  in  the  community.  It  has  provided  insight  into 
the  approximation  of  S^g  that  led  to  the  development  of  the  new  algorithm.  Details  of  the 
new  algorithms  are  also  given;  and  using  computer  simulations,  the  approximation  efficiencies 
of  the  two  algorithms  are  compared  with  the  actual  matrix  multiplication,  Smg- 

1.  INTRODUCTION 

Nowcasting  and  forecasting  are  important  activities  in  meteorology.  Nowcasting  combines 
the  output  of  a  numerical  forecast  model  with  observations  to  provide  initial  conditions,  that 
are  in  dynamical  balance,  used  by  the  model  to  calculate  the  forecast.  Now  that  the  numerical 
models  of  the  general  circulation  of  the  ocean  have  matured,  these  two  activities  are  playing  an 
important  role  in  oceanography  to  provide  a  realistic  description  of  the  various  oceanic  regions. 
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To  adequately  describe  the  general  circulation  of  the  ocean  requires  a  fine  spatial  resolution 
and  extremely  small  time  steps  in  numerical  model  integration.  This  combination  leads  to 
large  in-core  computer  memories  and  computer  processing  time  for  the  model  integration 
In  addition,  the  numerical  models  have  to  be  combined  with  data  assimilation  schemes  to 
derive  accurate  initial  conditions  which  are  used  by  the  model  to  compute  forecasts.  To  make 
the  nowcasting  and  forecasting  practical,  it  is  necessary  that  numerical  modeling  and  data 
assimilation  schemes  be  computationally  efficient  and  not  require  inordinately  large  computer 
in-core  memories. 

Data  assimilation  procedures  that  are  efficient  and  are  widely  applicable,  as  reported  in 
this  technical  report,  were  developed  at  the  Institute  for  Naval  Oceanography  (INO).  Initial 
data  assimilation  work  was  done  using  an  algorithm  reported  by  Derber  and  Rosati  (1989), 
(hereafter  referred  to  as  DR)  which  was  developed  for  global  data  assimilation  application. 
This  scheme  is  an  optimum  interpolation  scheme  but  unique  to  the  extent  that  it  performs  a 
global  minimization  to  update  the  entire  model  domain  simultaneously,  as  opposed  to  other 
optimum  interpolation  schemes  that  update  at  a  single  point  at  a  time.  The  scheme  is  quite 
efficient;  but  when  combined  with  a  primitive  equation,  numerical  model  of  the  Gulf  of  Mexico, 
the  nowcasting  step  took  too  much  computer  time  for  it  to  be  useful  in  real-time  applications. 
After  an  initial  implementation  and  test  of  the  of  this  scheme,  our  obvious  thrust  was  to 
enhance  and  make  it  more  efficient. 

The  DR  algorithm  combines  the  model  output  with  the  observations  using  a  linear  least 
squares  method.  Because  of  the  large  dimensions  involved,  the  minimization  in  the  least 
squares  procedure  is  performed  by  the  method  of  steepest  descent,  as  it  avoids  inversion 
of  the  covariance  matrix,  E,„,  of  the  model  output.  Instead,  the  procedure  requires  several 
multiplications  of  the  matrix  Sm  with  a  vector,  g.  Because  the  dimensions  of  S,n  are  large,  and 
in-core  storage  parsimony  is  a  necessity,  it  is  not  stored  in-core.  Since,  repeated  computation 
of  Sni  would  require  large  computational  resources,  DR  developed  an  efficient  algorithm  to 
evaluate  Smg-  This  algorithm  avoided  evaluation  of  Em  and  approximated  the  multiplication 
Smg  by  repeated  applications  of  a  Laplacian  operator. 

The  purpose  of  this  technical  report  is  to  provide  details  of  the  DR  algorithm,  which 
has  since  found  a  much  wider  acceptance  in  the  community.  This  algorithm  provided  insight 
into  the  mechanism  of  approximating  Emg  via  spectral  domain  and  led  to  the  development 
of  a  new  algorithm.  The  new  algorithm  (named  as  the  Product-Polynomial  Operator  (PPO) 
algorithm),  while  maintaining  the  accuracy,  approximates  Emg  ^ar  more  efficiently  than  the 
Laplacian  algorithm;  the  approximation  is  affected  by  applying  a  product  of  two  operators  that 
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are  polynomials  in  simple  averaging  operators.  Using  computer  simulations,  approximation 
efficiencies  of  the  two  algorithms  are  compared  with  the  actual  matrix  multiplication,  E,„g, 

In  addition  to  the  basic  motivation  to  develop  a  more  efficient  data  assimilation  algorithm 
with  a  Gaussian  covariance  function,  it  has  also  been  of  interest  to  develop  an  algorithm  which 
accepts  other  spatial  covariance  functions  of  the  model  output  errors.  The  PPO  algorithm  is 
shown  to  admit,  with  equal  ease  and  efficiency,  a  wider  class  of  covariance  functions  that  can 
be  expressed  as  a  product  of  factors  that  are  even  functions  in  a  single  dimension.  However, 
the  practical  utility  of  this  result  is  yet  to  be  explored;  presently,  not  many  two-dimensional 
covariance  structures,  other  than  Gaussian,  appear  to  satisfy  the  factorability  condition. 

Although,  the  application  of  this  work  is  intended  for  atmospheric  and  oceanic  applications, 
its  contents  are  more  mathematically  oriented.  In  Section  2,  we  first  develop  the  statistical 
framework  needed  to  arrive  at  a  common  formulation  of  the  optimum  interpolation  problem, 
leading  to  the  least  squares  solution  by  the  steepest  descent,  iterative  minimization  process. 
In  particular,  it  indicates  the  step  where  the  matrix  multiplication  Smg  has  to  be  evaluated 
repeatedly.  Next,  in  Section  3,  we  describe  the  Laplacian  smoothing  algorithm  used  in  DR 
to  approximate  this  matrix  multiplication.  This  is  followed,  in  Section  4,  by  the  development 
and  description  of  our  newly-developed  PPO  algorithm  for  the  same  approximation,  with  a 
discussion  on  the  determination  of  the  operator-polynomial  degrees,  approximation  accuracy, 
and  implementation  procedures.  Finally,  in  Section  5,  we  present  the  results  of  numerical 
simulations  that  compare  the  approximation  accuracy  and  the  computation  efficiency  of  the 
two  algorithms. 

2.  STATISTICAL  FORMULATION  OF  THE  DATA  ASSIMILATION  PROBLEM 

The  purpose  of  data  assimilation  is  to  estimate  the  true  state  of  the  ocean,  ©,  by  com¬ 
bining  the  model  output,  T,„,  and  observations  T„.  The  combined  estimate  (after  dynamic 
initialization,  in  some  cases;  see  Daley,  1981)  is  then  used  as  the  initial  condition  for  the  model 
integration.  The  vectors  0  and  T„,  are  iY-vectors  defined  on  the  model  grid,  the  obser¬ 
vation  vector,  T„  is  an  ^/-vector  defined  on  the  observation  grid  Qo-  Usually,  M  <<  N ,  and 
Tn  alone  cannot  provide  an  adequate  representation  of  ©.  Thus,  assimilation  is  performed, 
resulting  in  an  estimate  on  (7,„  to  provide  initial  conditions  for  the  numerical  integration  of  the 
model  equations. 

We  assume  there  exists  a  mapping  such  that  D(^,„)  =  Qo-  Then  ©„,  the  true  state  of 
the  ocean  at  the  observation  grid,  can  be  written  as  ©„  =  D(©).  Often  D  is  assumed  to  be 
a  linear  mapping  so  that. 

©„  =  D©.  (2.1) 
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The  motivation  to  perform  such  a  data  assimilation  is  the  assumption  that  both  the  model 
values  and  the  observational  data  are  unbiased  estimators  of  the  true  state  of  the  ocean,  and 
that  the  optimal  combination  of  the  two  will  lead  to  an  estimator  of  0,  which  has  a  smaller 
error  variance  than  either  of  the  two.  Under  the  unbiasedness  assumption,  we  can  write  the 
following  linear  model: 


where  e,„  and  e„  are  vectors  of  zero  mean  random  errors  in  the  model  output  and  observations, 
with  E,„  and  as  their  respective  covariance  matrices.  It  is  reasonable  to  assume  that  the 
errors  in  and  T,„  are  statistically  independent.  Then,  the  least  squares  solution  to  the  true 
state  0  is  obtained  by  minimizing  the  quadratic  functional: 

0=:(T,„  -0yE-/(T,„  -0)  +  (T,-D©yET'(T„-D0)  (2.3) 

where  the  prime  indicates  matrix  transposition.  Lorenc  (1986)  arrived  at  the  minimization  of 
this  quadratic  functional  from  a  Bayesian  approach.  It  is  customary  to  write  Q  in  terms  of 
corrections  to  the  model  values; 

Q  =  t;s-'Tc  +  (DTe  -  TcoyS,-'(DTc  -  Tc).  (2.4) 

where  =  T„,  -  0  and  T^o  =  DT,„  -  T^. 

Minimization  of  Q  is  the  most  common  formulation  of  data  assimilation.  Ordinarily,  this 
minimization  should  be  quite  simple  but  for  the  fact  that  the  dimensions  of  the  model  grid, 
and  hence  that  of  the  covariance  matrix,  ate  extremely  large;  thus,  the  routinely-used 
procedures  are  rendered  inadequate,  and  efficient  minimization  algorithms  must  be  devised. 

The  data  assimilation  algorithm  is  determined  by  specifying  the  covariance  matrices 
and  and  by  the  method  of  minimizing  Q.  With  a  given  Sm.  DR  gave  an  efficient, 
preconditioned  conjugate  gradient  algorithm  for  minimizing  Q,  which  avoids  computing  E 
(also,  see  Navon  and  Legler,  1987).  It  iteratively  reduces  the  gradient  vector,  g,  to  zero;  i.e., 

E-'T,  +  D'i:;>(D(TJ-T„)^0.  (2.5) 

Preconditioning  is  supplied  by  the  matrix  S„,.  The  iteration  steps  of  the  conjugate  gradient 
method  are  as  follows.  Let  T,  be  the  iterative  solution  to  Tr  at  iteration  /.  We  start  with 
initialization: 

Ti  =  0 

g,  =  -DTe-‘T„ 
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(2.6n) 

(2.6/,) 


Ill  —  Smgi 

tin  =  ©0  =  0. 


(2.6,) 


Then,  the  iterations  are  carried  out  via  equations,  (2.7a)-(2.7h),  given  below: 


d,i  —  h»  + 

(2.7„) 

^11  — l®rj  — 1 

(2.71,) 

f„  =  r„-bD^ST'Dd„ 

(2.7c) 

(d„.f„) 

g/i+i  ~  g»i  T  0,1  f„ 

(2.7,1) 

(2.7,) 

Tii-t-i  =  T„  -b  «„d„ 

(2.7/) 

~  5jn»gn  +  l 

(2.7;/) 

■'n  +  l  —  ,  1  , 

(27/.) 

where  n  is  the  iteration  index.  These  steps  are  repeated  until  convergence  is  achieved.  Because 
of  statistically  independent  observation  errors,  their  covariance  nnatrix,  So,  and  its  inverse, 
Sj',  are  diagonal.  Thus,  expressions  in  eqs.  (2.6b)  and  (2.7c)  involving  these  matrices 
are  easily  computed.  Also,  the  bilinear  interpolation  matrix  D  is  ingeniously  manipulated. 
Thus,  the  only  computational  concerns  arise  from  eqs.  (2.6)  and  (2.7),  where  computation  of 
h„+i  =  S,„g„+i  is  carried  out. 

For  most  implemented  covariance  structures,  this  operation  is  quite  expensive,  moreso  for 
repeated  applications  of  the  operation.  In  fact,  there  is  a  two-fold  problem  at  this  step:  One 
is  the  multiplication  operation  that  we  just  mentioned;  the  second  is  a  practical  problem  of 
actual  storage  of  the  large-dimensioned  computed  covariance  matrix.  Although  the  computed 
matrix  can  be  stored  on  a  file  and  read  when  needed,  the  input/output  operations  can  result 
in  excessive  demand  on  computer  resources. 

The  DR  algorithm  alleviates  the  above  difficulties  when  the  covariance  structure  of  Sm 
is  based  on  a  Gaussian  function 

e(r)  =  aexp(-rV^'^).  (2.8) 

e(/  )  represents  the  covariance  between  two  model  grid  values,  r  is  the  distance  between  the 
two  grid  points,  and  b  is  the  correlation  length  scale.  The  parameter  a  represents  the  error 
variance  of  the  model  values.  We  have  developed  a  new  algorithm,  based  on  polynomials  in 
simple  averaging  operators,  that  is  more  efficient  than  the  DR  algorithm  and  accepts  other 
covariance  structures.  The  two  algorithms  are  described  below. 
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3.  THE  LAPLACIAN  SMOOTHING  ALGORITHM 


A  description  of  the  DR  algorithm  is  provided  below  for  several  reasons.  From  a  historical 
perspective,  there  is  interest  as  to  where  we  are  coming  from,  what  are  the  salient  features  of 
the  original  algorithm,  and  what  features  are  being  modified  or  enhanced.  Because  of  a  wide 
acceptance  of  this  algorithm,  it  is  appropriate  to  provide  a  detailed  description.  This  algorithm 
provided  the  motivation  to  look  at  the  efficiency  problem  in  the  spectral  domain.  A  heuristic 
formulation  of  this  development  is  a  necessary  background  for  the  PRO  algorithm  also. 

The  DR  algorithm  is  based  on  three  stipulations:  (1)  the  observation  errors  are  statistically 
independent;  (2)  the  model  errors  are  statistically  independent  of  the  observation  errors,  and 
(3)  the  model  errors  are  assumed  to  have  a  Gaussian  covariance  structure.  The  first  two 
components  led  to  the  minimization  of  the  quadratic  functional  O  in  (2.3).  To  avoid  inversion 
of  the  large-dimensioned  matrix  the  iterative  method  of  conjugate  gradient  is  used,  whose 
iterative  steps  are  specified  in  eqs.  (2.6)  and  (2.7). 

Note  that  we  still  need  to  invert  the  observation  error  covariance  matrix;  the  problem 

of  this  inversion  is  easily  resolved  from  the  stipulation  that  the  errors  of  observations  are 
statistically  independent.  In  addition,  to  efficiently  perform  the  iterative  steps  in  eqs.  (2.6)- 
(2.7),  one  needs  to  define  the  bilinear  interpolation  matrix  D.  DR  formulated  D  in  a  simple  but 
elegant  way  so  that  its  transposition  and  multiplication  in  eq.  (2.7c)  are  easily  manipulated; 
they  handle  observations,  one  at  a  time,  so  that  the  matrix  D  does  not  have  to  be  handled  in 
its  giant  form. 

Finally,  they  implemented  a  computationally  efficient  algorithm  for  performing  the  matrix 
multiplication  Smg-  This  algorithm  involved  N  applications  of  the  operator  (1  -I-  V^/4/fiY) 
on  the  field,  (j,  where  is  the  Laplacian  operator  defined  on  the  discrete  model  domain.  The 
number  .V,  referred  to  as  the  degree  of  the  Laplacian  operator,  is  determined  in  a  heuristic 
manner.  Described  below  is  their  algorithm,  the  method  of  determining  the  degree  X ,  and  the 
way  they  handle  the  boundary  problems  inherent  in  the  application  of  the  Laplacian  operator. 

3.1  Approximating  Smg 

The  usual  protocol  is  to  estimate  the  covariance  structure,  S,,,,  from  long-term  historical 
nowcasting  experiments  using  real  data.  Flowever,  in  the  absence  of  such  a  historical  record, 
one  can  use  reasonable  approximations  to  provide  a  substitute  covariance  function.  The  use  of 
the  Gaussian  formulation  is  a  well  accepted  alternative. 


Gaussian  covariance  structures  have  been  used  quite  extensively  in  various  applications 
(Daley,  1991)  and  have  been  shown  to  be  quite  appropriate  in  regions  where  correlations  are 
positive  (Thompson,  1956;  Thiebaux  and  Redder,  1987)  With  respect  to  the  Gulf  of  Mexico, 
Gaussian  covariance  structures  have  been  used  to  derive  realistic  dynamic  climatologies  of  the 
Gulf  (Passi  and  Kantha,  1992)  The  algorithm  described  below  to  approximate  the  matrix 
multiplication,  E,„g.  is  predicated  on  the  fact  that  U„,  is  based  on  a  Gaussian  covariance 
function. 

The  Gaussian  covariance  function,  f  (/  ),  from  (2.8)  is  expressed  in  a  two-dimensional  plane 
as: 

((/■)  =  =  oexp{-  y(.r^  -f  t/^)]  (3.1) 

where  the  coefficients  n  and  T  describe  the  error  and  the  scale  functions  being  considered. 
Comparing  with  (2.8),  o  —  a  and  =  p,  where  b  is  the  correlation  scale  length. 

For  further  development,  it  will  be  useful  to  write  an  expression  for  an  element  of  the 
vector  h  =  E,ng  that  occurs  in  (2.6c)  and  (2.7g).  Note  that  the  vector  h  corresponds  to  the 
two-dimensional  model  grid.  Thus,  for  a  grid-point  (/ij.  /o).  we  have. 

f  »r»  tn 

M'o..yo)  =  (3.2) 

'=17=1 

where  A  is  the  uniform  grid  length  along  the  A'  and  Y  directions,  and  J,,,  are  the  maximum 
values  of  i.  j.  Thus,  the  array  h,  resulting  from  the  matrix  multiplication,  Smg.  represents 
smoothing  of  the  g  array  using  the  Gaussian  function,  e(.r.  y).  DR  approximate  such  smoothing 
in  the  continuous  domain  by  a  Laplacian  operator  and  carry  over  the  results  to  the  discrete 
case  using  some  empirical  adjustments.  Their  Laplacian  approximation  result  can  be  derived 
as  follows: 

ProjwKition.  Let  y{.i\y)  be  a  function  having  continuous  second  order  derivatives  d^yjO.v^ 
and  and  let  e{x.y)  be  a  Gaussian  function  as  defined  in  (3.1),  then  for  a  large  integer 

A, 

^  j  ^(0.0)  =2  J  y{.i\y)({.r.  y)d.rdy. 

A  Heuristic  Motivation:  Here,  we  provide  a  heuristic  motivation  that  guided  the  development 
of  the  DR  algorithm.  A  rigorous  and  exact  treatment  of  the  required  analysis  is  neither  necessary 
nor  the  intent  of  this  technical  report.  The  main  objective  is  to  provide  an  insight  into  and 
direction  toward  the  analysis  of  the  PRO  algorithm,  which  is  both  rigorous  and  exact. 

Since  y)  is  continuous,  it  can  be  expressed  in  the  spectral  domain  as: 
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(/(, (•.  (/)=  ^  ^  exp(-/(;/(.i'  4- /"/)]• 


(3.3) 


Ml=  —  X.  ^ 


The  smoothed  value,  at  .r  =  //  =  0  of  </(.r. //),  using  the  Gaussian  smoothing  function 

(3.1),  is  given  by. 


/  exp(- 4- V")I  ^  ^  r/,„„  exp[-/(/?/.r  4- »/(/)]</.<</!/ 

l)l=  — ^ll=  — TC 

^  ^  r  >  .  i  1 

(ITT  v| — '  ^ — »  ll~  4"  III 

=  — L  L  — 47-- 

nj  =  —  oc  ff  =  —  ^  ** 


(3.4) 


The  objective  now  is  to  approximate  Ii{Q.Q)  of  eq.  (3.4)  by  applying  a  Laplacian  smoothing. 
From  (3.4)  one  needs  to  define  an  operator  based  on  Laplacian  that  will  yield  an  amplitude 
proportional  to  exp  —  at  wave  numbers  (;»./?). 

Note  that 

^■4( '■■'/)  =(^  + Ip:) 


=  ^  ^  a,„„(-/ri^  - //^)exp[-/(/?/.f  +  u,(/)) 


/||=— X/I=s  — X 


Since,  for  a  given  ij,  as  X  — >  oo,  lim(l  4-  y/N)‘'^  —  exp(i/),  an  examination  of  (3.5)  suggests 
the  application  of  the  operator  £  =  (1  4-  -y^^)  to  j/i  so  that  at  .v  =  y  =  0,  this  application 
yields: 


£j(0,0)=  Yi  E 


{ml  +  n'l)  J  . 


(3.6) 


Successive  N  applications  of  this  operator  on  y  yield: 


‘^‘'S{o.o)=  Y  Y  "»"■  (i “ +  "4) 


(3.0 


ril  ss  —  X  f » =  —  X 


For  values  of  m  and  ;?  which  are  small  relative  to  N, 


V 

(l  -  -  €Xp[-^(?772  4-  7»^)]. 


(3.8) 
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Now,  we  assume  that  the  high  frequency  content  of  ;y(  <'■</)  's  negligible,  i  e,,  there  exists  an 
integer  .V'  <  .V,  so  that  a,,,,,  -  0  for  ni.ii  >  .V'  Thus,  from  (3,7),  we  can  approximate 


\, 


(3.9) 


£''  ^(0.0)  -  X]  X!  '  ^ 

V'  .V' 

-  'YL  'YL  "»'»  +  "^)]  from  (3,8) 

(1/  =  —  ,V'  »  =  —  ,V' 

-  YL  YL  + "')! 

/»=  —  'Xill=  —  oc 

We  compare  (3,4)  with  (3.9),  and  set  ^  ^  to  obtain: 

£^  .7(0.0)  ^  (  f/nT)/~)(0.0)  =  (■V'l^)  /  f 

J  —  -y^J  — oc 

which  demonstrates  the  proposition. 

The  results  of  the  continuous  case  demonstrated  above  are  easily  generalized  to  the  discrete 
model  domain.  Let  be  the  continuous  analog  of  the  discrete  array  (j{i.j),  i.e.,  o{i^j)  = 

y(/,,/)  .  Thus,  for  (/u<./(j)  =  (0.0),  we  can  approximate  h{ioJo)  of  (3.2)  as 

/?(/„.  jo)  ~  7/i(0.0) 
fjTT 


n 


=  i—£' 9(0.0), 

17  TV 


(3.10) 


where  is  an  appropriate  constant,  approximates  the  Laplacian  '^rom  y(r../)  defined  on  the 
discrete  grid,  C  is  the  corresponding  operator  replacing  C,  and  N  is  then  referred  to  as  the 
degree  of  the  Laplacian  operator. 

In  spite  of  all  these  approximations  made  in  the  above  derivation,  the  Laplacian  smoothing 
algorithm  works  extremely  well. 


3.2  Determination  of  *he  Degree  of  the  Laplacian  Operator  C 


/V 


The  degree  iV  of  the  operator  is  determined  heuristically.  The  criterion  is  to  choose 
that  value  of  N  which  stabilizes  the  computations  and  provides  accurate  computations,  though 


no  absolute  measures  of  the  two  criteria  are  easily  applied.  To  achieve  this,  the  following 
procedure  is  used: 

(i)  Select  a  value  .V  for  the  degree  of  . 

(ii)  Let  I  =  /„  be  the  middle  of  the  grid,  and  consider  points  (/n../) 

(iii)  Sequentially,  consider  a  two  dimensional  field  r,  such  that 


1 

0 


i^  >  =  /(I..; 
otherwise. 


.7 


where  1  <  •/  <  J,uax-  Note  that  only  one  field  is  non-zero.  For  each  combination 
<  J  <  J,„ar  smooth  the  field  with  Laplacian  smoothing  using  the  degree  .V 
selected  in  (i).  Let  i.’(/u..7)  be  the  smoothed  value  at  (/o.  /),  so  as  to  form  an  array 
.  1  <  .7  <  The  smoothing  is  performed  without  topography. 

(iv)  Compare  the  smoothed  array  try  with  the  smoothed  array  t,\\>  obtained  using  a  smaller 
degree  X' .  Stability  is  achieved  when  the  array  t’.v  varies  smoothly  over  .7,  and  accuracy 
is  achieved  when  the  two  arrays  do  not  differ  ‘significantly’.  We  stop  if  the  two  criteria  of 
smoothness  and  accuracy  are  satisfied,  otherwise  .V  is  incremented  by  a  suitable  integer 
and  the  above  steps  are  repeated. 

In  addition,  one  must  perform  extended  numerical  experimentation  to  ensure  that  the  selected 
value  of  .Y  does  not  make  the  data  assimilation  process  unstable.  To  avoid  this,  we  add  an 
increment  of  5-10%  of  the  selected  number.  For  instance,  in  our  experimentation  with  the 
Gulf  of  Mexico  data  assimilation  work,  we  found  that  numerical  stability  was  attained  around 
.V  =  300.  To  be  safe,  we  added  20  to  it  and  used  X  =  320. 


3.3  Treatment  of  the  Boundary  Values 

First,  we  consider  computation  of  the  Laplacian  at  an  internal  point  {i,j)  of  the 

field  ir.  In  this  case,  the  point  {i.j)  is  considered  as  a  center  of  five  point  star  in  the  ocean, 
and  the  Laplacian  is  approximated  as: 

~  +  1)  -  j))  -  -  </’(/, 7  -  1))1 

+  +  I.j)  -  «-•(/, j))  -  «’ir(v(T j)  -  ir{i  -  I.j))] 

=  j)(»’.v  +  «’s  +  +  ft'w) 

-f  wyU'{i,j  +  1)  +  U'su'iij  -  1)  +  +  I.j)  +  -  I.j)  (3.11) 
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where  /c's  are  location  dependent  and  distance  related  weights,  given  by 

('*  (*> 

"•  v  =  N-s  =  Vr7'  ^  "’/■  ==  ""  V  ;V/- 

where  </,  and  </,,  are  the  horizontal  grid  spacings  for  the  model.  Conceptually,  because  of  the 
latitude  difference,  it  is  possible  that  tc \  /  ws,  but  u'w  =  wj;. 

When  the  point  is  a  boundary  point,  then  the  following  approximation  is  implemented. 
Consider  a  one  dimensional  space  where  vo  =  0  is  the  value  assigned  to  the  point  next  to 
the  boundary  and  ci  is  the  value  at  the  boundary  point.  Then  the  smoothed  value  using  the 
Gaussian  function  is  given  by 

>rog  =  >i'i  exp{-/9r,^]  (3- 13) 

where  the  suffix  y  in  (.’y^  indicates  the  Gaussian  smoothing,  and  ry  is  the  distance  between  the 
two  points.  This  is  the  effect  approximated  after  N  applications  of  the  Laplacian  (corresponding 
to  the  discussion  of  Section  4.  Thus  we  denote 


~  i.'og  =  <l'i  exp[-/yr^] 


/  j  2I 

=  01  exp 

iV 


(3.14) 


For  L  Laplacian  applications,  where  £  is  a  large  integer,  we  obtain  from  (3.8): 


.V'  ^  "  \  ^  L  00  00  j. 

^  ^o,„„  f  1  -  expl-^ei(m^  +  /i'^)]  (3.15) 

yi=U/l=y  ^  ^  ni=()n=0 

The  right-hand  side  of  (3.14)  is  the  smoothing  effect  when  i3  is  replaced  by  Thus,  for  large 
integer  L, 

r  jV  ,1 

4'ql  =  01  exp  - (3,16) 

However,  the  approximation  is  assumed  for  all  L  =  1 . N.  Instead  of  different  smoothing 

at  each  Laplacian  application,  an  average  smoothing  factor 


L  00  ryo 


1  '^’1 
~  I 

l=l 


is  applied.  Thus,  at  each  Laplacian  step,  (3.16)  is  replaced  by 


(3.17) 


4'0„  =  V'l  «xp(-jVo,/jr51 


(3.18) 
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where  the  suffix  a  in  indicates  average  smoothing  application.  Dropping  the  direction 
indices  in  (3.12),  setting  =  </•"’,  and  recalling  that  d  =  we  can  write 


Substituting  in  (3.19),  we  obtain 


(3.19) 


'-'oa  =  <.’1  exp(-A'ae,J;-,-^] 

_ ^ _ 


(3.20) 


Now,  we  can  discuss  the  following  boundary  cases: 

(i)  The  point  (/.  j)  is  the  right  most  point,  i.e.,  the  point  (/•  +  l.j)  corresponds  to  land:  in 

this  case  set  w  =  u'e,  and  j/>oa  =  tf'i+i.j- 

(ii)  The  point  (/../)  is  the  left  most  point,  i.e.,  the  point  (/  —  l,j)  corresponds  to  land:  in  this 

case  set  w  =  ir\\\  =  u'ij  and 

(iii)  The  point  {i.j)  is  the  top  most  point,  i.e.,  the  point  {i.j  -1-1)  corresponds  to  land:  in  this 
case  set  w  =  iri\.  i/’i  =  and  v'oa  = 

(iv)  The  point  {i.j)  is  the  bottom  point,  i.e.,  the  point  (i.j  —  1)  corresponds  to  land:  in  this 
case  set  tv  =  ws.  C'l  =  ti^ij  and  c’on  =  U'i.j-i- 


4.  PRODUCT-POLYNOMIAL  OPERATOR  ALGORITHM 

As  we  follow  the  derivation  of  the  DR  algorithm  in  Section  3,  two  possible  improvements 
are  suggested.  The  first  is  to  replace  the  operator  11-}-^)  in  the  Proposition  by  some  other 
polynomial  in  V^,  P  ®  better  approximation  to 

(3.4).  If  a  more  accurate  approximation  is  achieved,  then  we  believe  it  would  be  possible  to  pick 
the  degree  of  p  much  lower  than  N,  thus  saving  many  computations  when  computing 
As  we  analyze  this  possibility,  we  notice  the  second  improvement.  Since  the  quantity  we  are 
trying  to  estimate  (3.2)  is  a  convolution  product  over  a  discrete  space,  it  should  be  possible 
to  give  an  analysis  that  exploits  this  fact  directly  without  using  continuous  approximations  as 
in  Section  3.  This  proves  to  be  true.  We  find  that  in  this  new  domain  analysis,  the  Laplacian 
operator  is  best  replaced  by  a  product  of  simple  averaging  operators.  This  leads  to  a  good 
low-degree  approximation  to  h{m.n)  of  (3.2),  and  a  simpler  and  more  direct  analysis.  This, 
in  turn,  makes  it  much  easier  to  specify  the  degree  of  p  and  handle  the  boundary  conditions. 
For  completeness  we  give  a  mathematical  development  of  this  analysis. 
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4.1  Preliminaries  and  Notation 


In  the  subsequent  development,  we  will  express  the  matrix  multiplication,  E„ig,  as  a 
convolution,  and  exploit  the  fact  that  the  Fourier  transform  of  the  convolution  is  a  product 
of  the  transforms.  Convolution  between  two,  two-dimensional  arrays,  and  ;/.'((»/. n)  is 

defined  as: 

([/I  *  •J2){fn.  ”)  =  X]  X!  »  -  0-  (^1) 

From  Fourier  theory,  we  know  that 

,jlTy.2  =  Ol'Jl-  (^-2) 


Further,  we  shall  define  a  function  of  several  variables,  /(.ci . r„)  as  .'tt'/pa.ru.hip.  if  it 

can  be  expressed  as  a  product  of  factors  that  are  even  functions  of  a  single  variable,  i.e., 
fi  n . '•„)  =  n /<(  '■<)  such  that  /,{-■<)  =  f,i<). 

In  our  application 


e{s.t)  =  exp((-.-<^^f  -  (4.3) 

Accordingly,  this  covariance  function  (4.3)  is  separable,  since  =  (ifi{^)fzif)  where 

/,((/)  =  exp(— We  will  require  that  the  covariance  functions  be  separable.  A  conse¬ 
quence  of  the  functional  separability  of  a  covariance  function  is  that  its  Fourier  transform  is 
also  separable  (due  to  factorability)  and  real  (because  of  evenness). 

4.2  Derivation  of  Canonical  Forms 

The  development  of  the  Product-Polynomial  Operator  is  aided  by  deriving  two  canonical 
forms  that  are  functionally  similar.  One  canonical  form  expresses  the  two  dimensional  convolu¬ 
tion,  hiiv.ti),  in  the  Fourier  domain,  while  the  other  represents  the  resultant  of  when  y(m./j) 
is  operated  on  by  the  PPO.  A  comparison  of  the  two  expressions  leads  to  the  selection  of  the 
polynomials  to  specify  the  PPO. 

We  will  consider  convolution  approximation  in  two  dimensions  with  an  'elliptical’  covari¬ 
ance  function  t(.s.f)  given  by  (4.3);  the  Gaussian  covariance  function  is  a  special  case  of  (4.3) 
with  <')[  =  d2  =  d.  The  ‘elliptical’  covariance  formulation  provides  the  flexibility  of  different 
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model  grid  lengths  along  the  two  coordinates.  The  elements  of  the  h  vector  corresponding  to 
(3.2)  are  given  by 


//(//;.;))  =  (y(.s.  t)(i  exp 


i.t 


df{ni  -  -  if 


(4.4) 


We  want  to  show  that  /i((n.  ii)  in  (4.4)  can  be  approximated  by  successively  applying,  yet  to  be 
determined  polynomial  operators.  P\{D\ )  and  Pi{Dz)  where  Dj.j  =  1.2  are  simple  averaging 
operators,  given  by 


It)  =  [(j{n)  +  1.  n)  +  y(>}?  —  1.  (4.5(/) 

D2<j{iu.  ii)  =  (</(m.  n  +  1)  +  <j{iiK  n  -  l)]/2.  (4.5/r) 


The  implementation  is  affected  by  a  comparison  of  the  two  canonical  expressions  given  in  the 
following  theorems: 

Thiuirr.iii.  1.  Let  ,  li')  be.  the  two  dimenfional  Fourier  transform  of  y{ni .  ii).  Then 

/j(m.n)=  f  (4.6) 

Jo  Jo 

where 

qk{0)  =  exp(— .8^^f./6^)exp(— 27rffi^).  k  =  1.2.  (4.7) 


Theorem  2.  Let  Dj,j  =  1.2  be  the  averaging  operators  frojn  {4-^J-  Then  for  any  polyno¬ 
mials  Pi  and  Py. 

P2{D2)Pi{Di)g{nun)  =  f  g{e,  d')P2{cos2Kr)Pi{cos2K9)e‘^^^^'''^-^"'-'^  (4.8) 

Jo  Jo 


Note  that  the  right-hand  sides  of  (4.6)  and  (4.8)  are  similar  functional  forms.  An  exami¬ 
nation  of  the  two  equations  indicates  that  we  could  approximate  />(m,  n)  as 

li{nKn)  ~  P2{D2)Pi{Di)g{nKn)  (4.9) 

provided  we  choose  Pi  and  P-z  such  that  Pj(cos27r^)  ~  qi{9),  and  P2{cos2iri')  ~  qii’i').  How 
good  an  approximation  (4.7)  is  depends  on  how  well  we  can  approximate  ry,  by  polynomials. 
Pi,  over  the  domain  of  ry/. 
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We  now  prove  the  two  results  in  (4.6)  and  (4.8).  First,  we  note  that  h{in.  n)  is  a  convolu¬ 
tion  of  with  the  two-dimensional  array  defined  in  (4.3).  Following  the  arguments 

of  (3.2), 


ii)  =  sJ)aexp 

s.t 

=  J^Y.<j{s.f)aexp 


Sj 


(f,»  -  >  0'^  +  iUn  - 
1)^ 

df  ( nt  -  -f  dj{n  -  t)^ 


=  ^^j/(.s.t)f(m  -  .s,n  -/) 
=  (j*e. 


(4.10) 


The  proof  of  Theorem  1  follows  as  an  immediate  consequence  of  the  convolution  (4.10). 
Proof  of  Theorem  1.  We  start  by  computing  the  Fourier  transform,  c: 

((S,  ^/-)  =  ^  ^  exp((-.s^<!>f  -  exp(-2;r/5i^  -  Iwiti/')  (^11) 

where  (f  and  V’  are  in  the  interval  [0, 1].  Because  e(.s,f)  is  separable,  we  can  write 

=  ^]^exp(-.!!^<!)i)/6^exp(-27r?.:»ff))  e>^P('-27r/7j/')) 

=  ^i(^)«/2(V’)  (^-12) 

where  (/*.,  A-  =  1. 2  are  defined  in  (4.7).  Next,  since  from  (4.10),  /?(»?.  n)  is  a  convolution  of  y 
and  f ,  we  can  write  h  in  the  transform  space  as  a  product  of  their  transforms: 

h=  ye 

=  (/qi<l2-  (^13) 


Thus,  by  the  Fourier  Inversion  Theorem 

h{nun)=  f 

J\)  J\) 

Finally,  Theorem  1  is  proved  when  h  in  the  above  equation  is  replaced  by  the  right-hand  side 
of  (4.13). 

Proof  of  Theorem  2.  With  y  as  the  Fourier  transform  of  </,  we  can  write 

y{ni,n)=  f  f  (4.14) 

Jo  Jo 
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Applying  operator  D\  from  (4.5a)  to  (4.14)  we  get 


r\  pi  /  21TIO  I 

Di  (/(/;;.//)  =  /  /  ^ 

^1)  A)  ^ 

=  [  f  </(^.  C’)cos(27r#^) 

./u  ./o 


2jr/(  . 


We  note  that  every  time  D\  operates  on  g,  we  get  an  extra  factor  of  cos(27rfy)  inside  the 

integral.  Thus,  computing  Di.  Df . it  is  easy  to  see  that  the  application  of  a  polynomial 

operator,  Pi(Di),  yields: 

ri{Di)y{ni.n)=  f  [  5(0.  c-)A(cos 2;r^)e^’^'<'"‘'+'"'’>f/0f/i.'. 

./o  Jo 

A  similar  application  of  operator  polynomial  Pi{Di)  yields: 


P2{D2)g{in.ii)  =  j  f  5(0.  v’)-f2(cos27ri/')e 

Jt)  Jo 


'iwi{  m#+in/>) 


Thus,  a  successive  application  of  two  polynomial  operators,  Pi  and  Pz,  gives  the  desired 
expression  (4.8),  proving  Theorem  2. 

If  we  want  to  approximate  by  P2(i?2)A(A)i)(</)(m,  n),  then  the  error  will  be 

bounded  by 


|(£:-P2(D?)/’i(£'i))4r("i.ii)l  <  /'  f  \H«-  v)\rWdf 

Jo  Jo 


(4.15) 


sup  1^1(0)92(1/’)  -  Pi(cos2;r0)F2(cos27ri/  )| 


where  9,  are  given  by  (4.6).  Thus,  we  see  that  our  goal  should  be  to  estimate  9i(0)  by 
P,(cos  27r0),  i  =  1, 2.  This  is  discussed  in  Section  4.4  below. 


4.3  Generalization  to  Separable  Covariance  Functions 

This  generalization  is  easily  achieved  by  the  following  lemma  and  a  restatement  of  Theorem 
1.  The  results  are  stated  in  two  dimensions,  extension  to  higher  dimensions  is  obvious. 

Lemma  1:  Let  e(x,{/)  =  ei(.r)e2(.y)  0.  separable  function,  i.e..  Ci(i')  =  €i{—s).  then  its 

Fourier  tran.^form.  is  also  separable  such  that 

e(0,0)  =  ei(0)e2(y>)» 
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and  r',((i)  =  c,(— (i).  0  <  (\  <  27r. 

Thcon  rti.  .V.  (.’)  be.  the  two  dimensional  Fourier  transform  of  y{ni .  d).  Then 

IS  ijieeii.  by  (4-^):  Ihis  ease 

Va(^)  =  ^  f<  (.^)exp{-2T/.^<^),  k  -1.2  (4.16) 

Lemma  1  is  essentially  proved  in  the  beginning  of  the  proof  of  Theorem  1  when  exp((  — s  — 
t^hi)/b^\  in  (4.11)  is  replaced  by  ({s.  t).  The  separability  of  the  Fourier  transform  is  demon¬ 
strated  in  (4.12). 

We  again  note  that,  although  the  applicability  of  the  PRO  algorithm  to  the  wider  class 
of  separable  covariance  functions  is  quite  elegant,  its  practical  utility  is  yet  to  be  explored. 
Presently,  often  employed  covariance  structures,  other  than  Gaussian,  do  not  appear  to  be 
amenable  to  such  simple  representation. 

4.4  Polynomial-Operator  Approximations 

Using  the  evenness  property  of  efc(.?)  in  (4.16),  we  have 

^  ei{s)2cos{2rrs0)  -f  e,(0). 

3>1 

In  our  application,  e,(.'i)  =  exp{—s^h‘f/b^),  which  is  an  even  function  in  Now,  consider 
picking  a  polynomial  Pi  so  that  Pi(cos27r^)  is  a  good  approximation  to  qi{9).  To  more  easily 
compute  this  polynomial,  we  make  the  change  of  variables  /  =  cos27r^,  such  that  f  ranges  over 
(-1,1).  Thus,  we  need  to  compute  Th  =  cos(2tH)  in  terms  off.  This  is  easily  achieved  using 
the  Chebychev  polynomial  recursion  formula  (Conte  and  deBoor,  pg.  239,  1980).  This  result 
implies  there  exists  a  kth  degree  polynomial  Tk  such  that  Tk{t)  =  Tk{cos2K$)  =  cos(27rA;0), 
and 

r,+i  =2fr,(0-r*_,(f),  k  =  i.2.... 

where  Tti{t)  =  1,  and  Ti{t)  =  t.  This  is  easily  seen  by  induction  using  trigonometric  identities; 

TkJ^ift)  =  cos(27r(/,-  -f  1)6)  =  cos27r^cos2A-7r^  —  sin  27r0  sin  2A'7r<? 

=  2cos2n6 cos2kTrB  —  {zos2n6 cos2Trk6  -1- sin 2T^sin 2A’n-^) 

=  2cos27r^cos2A;7r^  —  cos{2ir{k  —  1)^) 

=  2tTk{t)  -  Tk-i{t) 
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We  can  compute  for  any  value  of  /  =  cos(27r^)  using  the  recursion  to  compute  cos(2j>i^y) 
in  terms  of  /  and  adding  up  enough  terms  so  that  the  remainder  of  the  terms  will  be  small. 
This  is  possible  since 

Now  we  want  to  construct  Pi  as  an  approximation  to  rji(O)  considered  as  a  function  of  f. 
The  idea  here  is  to  compute  P|  as  an  interpolating  polynomial  on(-l,l]  in  t.  The  question  is  at 
which  points  do  we  interpolate?  We  use  the  expanded  Chebychev  points  (Conte  and  deBoor, 
pg.  244,  1980).  These  points  are: 


•I’ I 


o  -h  h  +  (a 


/  =  0 . n 


where  a  =  —1.  h  =  1.  This  choice  of  points  gives  nearly  an  optimal  polynomial  Pi  such  that 
max_i<i<i  |Pi(/)  —  'yi(0l  nearly  minimized  (Conte  and  deBoor,  pg.  243,  1980).  At  worst, 
the  error  is  only  about  four  times  as  great  using  these  points  as  it  would  be  if  we  actually 
found  the  optimal  polynomial.  There  are  algorithms  for  computing  these  optimal  polynomials. 
See  the  second  Algorithm  of  Remes,  and  the  Differential  Correction  Algorithm  (Ralston  and 
Rabinowitz,  pgs.  315-320,  1978).  We  use  the  expanded  Chebychev  points  instead  because  of 
their  simplicity  and  because  we  have  not  decided  what  degree  polynomial  to  use.  It  will  be  a 
lesser  computational  effort  to  determine  the  degree  using  the  expanded  Chebychev  points  than  if 
we  were  to  use  these  more  complicated  algorithms  to  determine  the  actual  best  approximation. 

Thus,  we  let  Pi{t)  be  the  interpolating  polynomial  to  qi{t)  =  at  the  expanded 

Chebychev  points.  The  function  qi{(f)  =  qi{f)  is  computed  to  e-tolerance  by  using  the  Cheby¬ 
chev  recursion  formula,  and 


30 

qi{e)  «  qi{ff)  =  ei(0)  +  ^ei(.?)2 cos(2n-.s^),  (417) 

,1=1 

where  we  pick  .riy  such  that  |ei(.^)|  <  Toll,  where  Toll  will  be  determined  below. 

4.5  Determination  of  the  Degree  of  Pi 

To  determine  the  degree  of  Pi  requires  some  computational  effort.  We  first  determine 
an  error  tolerance  Toly  (given  below).  We  pick  n  and  find  the  polynomial  of  degree  //  that 

interpolates  qi{(l)  at  the  expanded  Chebychev  points,  .i  j,  /  =  0 . n.  We  approximate  jjrji  — 

Pill  by  maxi<Kiou  |k/i  -  Pi||  where  #,  are  one  hundred  equally-spaced  points  on  [-1,1].  If 
ll<7i  <  T0I2,  we  stop.  Otherwise,  we  increase  the  degree.  We  stop  if  the  tolerance  is 

satisfied  or  if  the  degree  exceeds  some  predetermined  degree.  In  our  case,  we  chose  this  degree 
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to  about  10.  If  the  degree  becomes  too  large,  the  computations  become  numerically  unstable. 
One  should  also  stop  if  |j<Ji  —  Pi|l  starts  to  increase.  If  the  error  goal,  ||(yi  —  Pil|  S  To/.-, 
is  not  satisfied,  then  an  error  message  is  given,  and  the  degree  that  minimized  ||(Ji  —  P(|l  is 
used.  Assuming  the  error  goal  is  achieved,  we  have  ll<ii  -  -Pill  <  \\<i\  ~  'iill  +  !|yi  -  Pill  < 
T()l\  +  Tol>  =  Tolf.  A  similar  analysis  holds  for  «/>  and  P2  with  (|(/2  —  PjII  <  Pt>/,, 

4.6  Determination  of  Error  Tolerances 

To  determine  the  error  tolerances  above,  we  need  to  keep  in  mind  the  goal  of  making 

\h{in.  n)  —  P2{D >)P\{D \  ))f/('». »0I  ^  '0  €  2,~ 

where  f  is  given.  Note  f  should  not  be  chosen  much  smaller  than  the  expected  error  in  one’s 
measurements.  To  determine  Tol\.  Toly,  we  need  an  error  estimate. 

|/l(m./0-P2(P2)Pl(P|)M/H./0|  <  /'  -PlP^li 

Jo  Jo 

<  {MyToh  +  M^Toh)! 

where  J  =  |^((^.  i.')|rW(.’,  and  Mi  =  ||r/,||  (sup-norm).  We  have  also  approximated 

IIP2II  ~  We  can  estimate  ||ry,||  <  |f,(0)|  -f-  X).!i>i  2|f/(!’)|.  '  =  1.2.  Earlier,  we  were  able 
to  show  that  ||(y||i  <  ||</||i.  Now,  if  ||(/||i  is  large,  the  above  error  estimate  is  too  conservative 
to  be  of  any  use.  Instead  of  an  arbitrary  g,  we  choose  the  special  case  of  </  =  6,./,  which  is  one 
at  and  zero  at  all  other  points.  Then  ||</||i  =  1,  and  we  need  to  make  Tol:i  < 
and  pick  Tol{  =  Tol-y  =  This  is  justified  since 

g{io.n)  =  i.e.,  g  = 

it  si 

and  h(m.ii)  ~  Y^t  So,  if  we  approximate  E{6gj),  we  have  a  good  approxi¬ 

mation  to  /?).  We  have  found  that  the  above  strategy  works  well.  Other  error  estimates 
are  too  pessimistic  and,  hence,  require  much  more  computational  effort. 

4.7  What  to  Do  at  the  Boundary 

In  practice,  data  are  missing  off  some  finite  set.  We  set  this  missing  data  to  zero.  The 
algorithm  is  done  in  one  dimension  at  a  time.  Each  row  can  be  done  independently.  This 
means  that  many  of  these  computations  can  be  done  in  parallel.  The  nonzero  values  in  the 
data  propagate  into  the  missing  data,  one  unit  on  each  end  of  a  fixed  data  line,  each  time 
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we  apply  D,.  Thus,  we  need  to  add  tenfiporary  memory  of  size  lit,,  where  //,  =  degree  of 
polynomial  P,.  /  =  1.2.  We  only  need  to  start  the  computations  at  a  point  once  there  is  a 
nonzero  value  on  either  side  of  the  point. 

The  calculations  for  the  PRO  algorithm  are  local  in  the  sense  that  each  time  we  apply  the 
averaging  operator,  we  just  use  information  directly  to  the  left  and  right  of  the  point  where 
the  computation  is  being  performed.  So  for  each  power  of  the  operator,  we  reach  one  unit 
further  to  the  left  and  right.  So  it  is  only  after  n,i  steps  that  we  notice  the  effect  of  data  //,/ 
units  away.  The  convolution  operator  itself  averages  over  all  the  data  and  so  it  is  global  in 
extent.  However,  if  we  did  a  truncated  sum  for  the  convolution,  i.e.,  stop  adding  after  the 
Gaussian  terms  are  small,  then  this  operator  would  also  be  local  in  nature.  The  DR  algorithm 
is  also  local  in  the  same  way,  but  it  uses  information  in  the  five  point  stencil  about  the  point 
(See  Section  3.3),  each  time  spreading  outward  one  unit  simultaneously  in  A'  and  Y  directions. 
After  A’  applications  of  H,  the  DR  has  absorbed  information  from  and  .V  x  ,V  square  around 
the  point. 

The  issue  at  the  boundary  is  that  both  algorithms  assume  the  missing  data  is  zero.  One 
could  extrapolate  or  interpolate  or  extend  the  data  in  a  periodic  fashion,  but  in  all  cases  we 
are  filling  in  missing  data  (with  PPO  the  data  is  set  to  zero).  Then  we  apply  the  algorithm. 
However,  we  don't  compute  the  operator  very  far  outside  our  data  set  since  we  don’t  care 
about  these  values  except  to  make  our  algorithm  function.  So,  we  only  compute  our  nj  units 
off  each  row.  Calculations  at  the  boundary  are  suspect  for  both  algorithms,  the  only  difference 
being  that  with  the  PPO  algorithm  we  treat  the  missing  data  as  zero  and  apply  the  algorithm, 
whereas  the  DR  algorithm  tries  to  approximate  the  effect  of  applying  operator  in  (3.20).  This 
is  because  the  degree  of  the  Laplacian  operator  is  so  high  that  computing  off  the  data  set 
would  introduce  serious  errors.  In  the  PPO  algorithm  the  degree  is  low,  so  it  is  no  problem. 

4.8  Computational  Effort 

When  we  interpolate  qi  at  the  expanded  Chebychev  points  ,i  o . r„,  we  obtain  a  poly¬ 

nomial  Pi  in  Newton  form  (Conte  and  deBoor,  pgs.  40-44,  1980), 

Pi{x)  =  oo  -I-  0)(.r  -  .vo)  +  02(.r  -  .ro)(;r  -  ,i’i )  -|- . . .  -|-  a„(.r  -  .Vo)  . . .  (.r  -  .r„_i ). 
Replacing  .r  by  D,,  we  need  to  compute 


Pi{Di)  =  (to  +  «i(A  -  .to)  -I-  «2(-Di  -  ro){Di  -  .1*1 )  -f- ...  -I-  (in{Di  -  xo). . .  (I>,  -  .i  „_i ). 
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We  evaluate  this  in  Newton  nested  form,  just  as  we  do  for  a  polynomial.  Thus,  a  typical  step 
in  this  evaluation  is  /.  i  /  +  -  /.  (/)  where  /  is  the  identity  operator.  So,  one  application 

of  such  an  operator  at  a  point  in  the  grid  requires  three  multiplications  and  two  additions 
To  evaluate  D  requires  one  more  addition  and  a  division  by  2  (a  shift  register,  so  we  will  not 
count  this  operation).  Thus,  one  step  in  the  nested  evaluation  is  6  arithmetic  operations  - 
3  multiplications  and  3  additions.  Then  we  do  this  at  each  point  of  the  grid.  We  do  this 
operation  //,  times  (degree  of  P,).  Let  .V  be  the  total  number  of  points  on  the  grid.  We  can 
either  add  2\v,  points  to  the  data  points  or  ignore  the  calculations  at  the  boundary,  if  the  degree 
is  small  compared  to  the  length  of  the  line.  So,  we  have  bn,X  operations  so  far.  After  we 
have  computed  Pi{Di),  we  compute  P2{D>).  So.  we  have  a  total  of  6A'(/ri  +  n>)  arithmetic 
operations  (0(N)).  Thus,  if  the  degrees  of  and  ;?>  are  fairly  small,  we  have  an  efficient 
algorithm.  If  and  H  )  are  large,  then  there  are  too  many  computations,  and  the  numerical 
stability  is  a  problem.  The  key  question  is  how  large  are  these  degrees?  It  is,  thus,  important 
not  to  make  unreasonable  error  requirements. 

Additional  ReinarLr  This  technique  fits  into  the  general  philosophy  found  in  Parks  and  Burns 
(1987),  and  references  therein.  One  estimates,  in  an  optimal  way,  the  response  function  y, 
by  selecting  coefficients  of  a  filter  to  best  estimate  q,  in  the  infinity  (supnorm)  norm.  Our 
algorithm  exploits  this  idea  by  using  a  polynomial  in  the  simple  averaging  operators  (filters). 

5.  EFFICIENCY  COMPARISON  OF  THE  THREE  ALGORITHMS 

In  order  to  give  some  measure  of  accuracy  we  implemented  two  pieces  of  code:  the  first 
calculates  h{iu.u)  directly  from  equation  (3.2)  and  the  second  approximates  h{iu.n)  with  the 
PPO  algorithm. 

To  be  more  explicit,  the  first  routine,  which  we  shall  call  Matrix-Mult,  calculated 

h(iv,  n)  «  ^  ^  g{s.  t)  exp  [{-{m  -  -  (n  -  (5.1) 

5  f 

where  the  sums  are  only  taken  over  "close”  pairs  (s.t),  -Jose  enough  that  the  exponential  term 
is  still  significant.  The  numbers  h,  in  (5.1)  above  may  be  interpreted  in  two  different  manners: 
with  equal  grid  lengths  in  the  two  orthogonal  directions,  they  represent  different  correlation 
length  scales,  />/(?),;  or,  with  the  same  correlation  length  scale,  h,  the  two  numbers  represent 
different  grid  lengths,  d,.  Obviously,  the  combination  of  the  two  situations  is  controlled  by 
specifying  di/fi. 

In  the  first  test  case,  the  function  .r/(s./)  was  a  delta  function,  i.e.,  it  is  zero  everywhere 
except  at  the  center  of  the  grid.  The  grid  boundary  was  rectangular  (Figs.  1-3).  In  the 
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Figure  1.  Test  Case  -  Gaussian  smoothing  of  delta  function  defined  on  a  rectan¬ 
gular  domain:  Matrix-Mult  Algorithm.  Results  are  scaled  to  0.01. 
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Figure  2.  Test  Case  -  Gaussian  smoothing  of  delta  function  defined  on  a  rectan¬ 
gular  domain;  Product  Polynomial  Algorithm.  Results  are  scaled  to  0.01. 
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second  test  case,  we  specified  0  —  1  g*’'^  points,  within  irregular  boundaries 

depicting  the  Gulf  of  Mexico  topography  (Figs.  4-6);  this  grid  was  employed  in  data  assimilation 
simulations  using  a  numerical  ocean  model,  which  is  the  third  test  case.  Thus,  the  first  two 
cases  provide  the  performance  of  the  three  algorithms  in  computing  the  matrix  multiplication, 
while  the  third  test  case  provides  the  comparison  in  actual  data  assimilation  situation.  For  all 
simulations,  the  scale  factor  /»  and  the  grid  dimensions  d,  in  the  simulations  were  taken  so  that 
h  =  50  and  ^i/h  =  0.4052  and  =  0.4448. 

While  implementing  the  PRO  algorithm,  the  number  of  terms,  .xq,  retained  in  (Ji  and  72 
in  (4.17)  are  calculated  as  functions  of  the  grid  spacing,  a  preset  tolerance  {Tol  =  0.001)  and 
the  scale  parameter  h  as  specified  for  Matrix-Mult.  Explicitly,  we  set  the  .sy  to  be 

.s,,  =  trnnc[  \J -  \n{Tol)/{av(j{tli-,)/h)\  -1-  1  (5-2) 

where  <lv,  is  the  grid  spacing  in  the  direction  x,  and  (tr(j{(lx,)  is  the  average  grid  spacing  in 
that  direction  and  ‘trunc’  is  the  truncation  operator.  For  both  q,,  (5.2)  gave  .■«o  =  7.  This  led 
to  the  Chebychev  polynomials  P,  of  degree  8,  that  uniformly  approximated  the  (y,’s  with  error 
approximately  0.000186. 

The  coefficients  of  the  polynomials  Pi  and  the  interpolation  points  remain  fixed  throughout 
any  single  run.  To  save  the  time  recomputing  we  used  a  FORTRAN  SAVE  statement  to  save 
the  interpolation  points  and  coefficients,  so  they  need  only  be  calculated  once.  We  set  up 
temporary  arrays  to  hold  the  enlarged  grids  to  implement  the  boundary  scheme.  Off  the  data 
set  the  arrays  are  set  to  zero,  and  we  use  the  off-grid  points  only  as  they  become  nonzero. 

Then  we  compute  the  application  of  each  operator  F,(Z?,)  on  the  grid  in  order.  That 
is,  we  first  calculate  Pi{Di){g),  then  P2{D2)[Pi{Di){(j)\.  We  use  the  Newton  form  of  the 
polynomial  with  the  precomputed  and  SAVE-ed  interpolation  points  and  coefficients. 

All  code  was  written  in  portable  FORTRAN  and  tested  on  the  CRAY  YMP.  The  simulation 
results  were  scaled  by  0.01,  and  are  given  in  Figs.  1-3.  Figure  3  provides  the  results  from  the 
original  DR  algorithm  (with  u  =  320  repeated  applications  of  the  Laplacian)  to  compare  with 
Matrix-Mult  and  the  PPO  algorithms  in  Figs  1  and  2  respectively.  No  significant  performance 
difference  is  discernible  in  this  case. 

Perhaps  more  illuminating  is  the  comparison  provided  by  the  second  set  of  simulations. 
In  this  case,  approximation  due  to  the  PP  algorithm  shows  a  better  fidelity  to  the  actual 
computation  than  the  DR  algorithm;  we  notice  some  performance  deterioration  of  the  DR 
scheme  along  the  boundary.  See  Figs.  4,  5,  and  6.  The  simulation  results  were  scaled  by  0.1. 
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Figure  4.  Test  Case  -  Gaussian  smoothing  of  constant  function  defined  on  an 
irregular  domain  provided  by  the  Gulf  of  Mexico  topography;  Matrix-Mult  Algo¬ 
rithm.  Results  are  scaled  to  0.1. 


26 


Figure  5.  Test  Case  -  Gaussian  smoothing  of  constant  function  defined  on  an 
irregular  domain  provided  by  the  Gulf  of  Mexico  topography:  Product  Polynomial 
Algorithm.  Results  are  scaled  to  0.1. 


Figure  6.  Test  Case  -  Gaussian  smoothing  of  constant  function  defined  on  an 
irregular  domain  provided  by  the  Gulf  of  Mexico  topography:  Derber-Rosati  Al¬ 
gorithm.  Results  are  scaled  to  0.1. 
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In  addition  to  giving  very  accurate  results,  the  PPO  algorithm  is  very  fast.  On  the  CRAY 
YMP,  for  a  single  matrix  multiplication  with  “real"  data,  the  PPO  algorithm  takes  approxi¬ 
mately  0.09  seconds,  the  DR  algorithm  3.1  seconds,  and  the  MATRIX-Mult  takes  15.4  seconds. 

In  a  full  data  assimilation  scenario,  the  computations  were  performed  on  CRAY  YMP  only. 
A  primitive  equation,  numerical  model  of  the  Gulf  of  Mexico,  integrated  for  forty  days  without 
any  assimilation,  followed  by  a  ten-day  integration  with  assimilation  of  the  vertical  temperature 
profile  data.  The  DR  scheme  used  .V  =  320  repeated  applications  of  Laplacians,  as  shown  in 
eq.  (3.10).  The  PPO  algorithm  used  a  sixth  degree  polynomial  operators,  P{D \  and  P{D>). 
In  addition  to  assimilation  by  the  two  algorithms,  the  data  were  assimilated  using  actual  matrix 
multiplication  in  (lOg),  so  that  the  results  of  the  two  algorithms  could  be  compared  against  a 
reference. 

The  results  are  shown  Figs.  7a,  b,  and  c  for  the  computed  sea-surface  height.  The  three 
plots  show  remarkable  similarity.  The  sea-surface  height  map  derived  from  the  PPO  algorithm 
(Fig.  7b)  matches  exactly  with  the  map  obtained  from  the  direct  matrix  multiplication  (Fig. 
7c),  indicating  differences  of  the  order  of  10~^  m.  The  sea  surface  height  map  from  the  DR 
algorithm  (Fig.  7a)  differs  in  the  third  decimal  place.  However,  we  see  quite  a  significant 
difference  from  the  sea  surface  temperature  maps  (Figs.  8a,b,c).  The  PPO  algorithm  (Fig. 
8b)  once  again  matches  exactly,  up  to  the  accuracy  of  the  map,  with  the  reference  case  (Fig. 
8c).  But  a  noticeable  deviation  between  the  maps  from  the  DR  algorithm  (Fig.  8a)  and  the 
reference  is  evident.  At  around  (88W,  22N),  the  DR  algorithm  produces  a  high  of  31.3'’C  as 
compared  to  a  high  of  30.3°C  in  the  reference  map. 

For  a  comparison  on  computation  times  in  the  full  data  assimilation  experiments,  the  DR 
scheme  took  6.87  s  per  day  of  assimilation  using  a  single  CRAY  YMP  processor  versus  .68  s 
for  the  PPO  algorithm.  The  matrix  multiplication  algorithm  took  an  inordinately  large  time  of 
25  CPU  s. 

6.  CONCLUDING  REMARKS 

We  have  formulated  the  problem  of  data  assimilation  in  terms  of  a  statistical  linear  model, 
where  the  true  state  of  the  ocean/atmosphere  is  estimated  by  minimizing  a  quadratic  functional. 
Matrix  multiplication,  Emg,  is  identified  to  be  the  major  computational  bottleneck,  where  E  is 
a  covariance  matrix  based  on  some  covariance  formulation  and  g  is  a  multi-dimensional  vector 
defined  on  a  specified  uniform  rectangular  grid. 

With  this  background,  we  provided  the  details  of  the  DR  algorithm.  This  algorithm  has 
found  a  wide  acceptance  in  the  user  community  as  it  incorporates  several  useful  features  and 


29 


ELEVATION  TS  TW 


DAY  51 


Figure  7a.  Test  Case  -  Data  Assimilation  simulations  using  vertical  temperature 
profiles:  Simulated  Sea  Surface  Height  Using  Derber-Rosati  algorithm. 
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Figure  7b.  Test  Case  -  Data  Assimilation  simulations  using  vertical  ternperature 
profiles;  Simulated  Sea  Surface  Height  Using  Product-Polynomial  algorithm. 
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Figure  7c.  Test  Case  -  Data  Assimilation  simulations  using  vertical  temperature 
profiles;  Simulated  Sea  Surface  Height  Using  Matrix-Mult  algorithm. 
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Figure  8a.  Test  Case  -  Data  Assimilation  simulations  using  vertical  temperature 
profiles:  Simulated  Sea  Surface  Temperature  Using  Derber-Rosati  algorithm. 


Figure  8b.  Test  Case  -  Data  Assimilation  simulations  using  vertical  temper¬ 
ature  profiles:  Simulated  Sea  Surface  Temperature  Using  Product-Polynomial 
algorithm. 
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Figure  8c.  Tes:  Case  -  Data  Assimilation  simulations  using  vertical  temperature 
profiles:  Simulated  Sea  Surface  Temperature  Using  Matrix-Mult  algorithm. 


provides  a  significant  computational  efficiency  over  the  actual  matrix  multiplication,  using  a 
Laplacian  smoother  provided  the  covariance  structure  was  Gaussian.  However,  the  algorithm 
needed  to  be  made  more  efficient  for  it  to  be  useful  in  practical  applications. 

Retaining  the  major  features  of  the  DR  algorithm,  we  have  derived  a  new  algorithm  that 
introduces  several  generalizing  contributions: 

•  Identify  that  S,„g  is  a  convolution  and,  thus,  express  it  in  the  spectral  domain  (4.6). 

•  Identify  simple  averaging  operators  (4.5),  and  show  that,  with  a  separable  covariance  struc¬ 
ture,  a  PRO  in  the  averaging  operators  applied  to  g  approximates  S,„g  more  accurately 
and  efficiently. 

The  overall  effect  is  that  we  have  come  up  with  an  elegant  algorithm  that  is  fast  and  efficient 
and  computes  convolutions  supporting  a  much  wider  class  of  covariance  structures  encountered 
in  the  physical  sciences. 

Although  the  derivation  is  shown  in  two  dimensions,  the  extension  to  higher  dimensions  is 
obvious.  Our  Fourier  transform-pair  in  one  dimension  is  the  integers  and  the  circle  group.  In 
higher  dimensions,  we  take  the  corresponding  product  of  these  groups  as  our  Fourier  pair. 

The  algorithm  is  efficient  if  one  of  the  functions  remains  fixed  and  the  convolution  is 
computed  many  times.  This  is  because  we  must  construct  a  polynomial  fit  to  a  Fourier 
transform  of  the  fixed  factor.  If  we  are  able  to  find  a  fit  within  our  error  tolerance  with  a  low 
degree  polynomial,  then  we  are  able  to  efficiently  and  accurately  estimate  the  convolution  with 
our  polynomial  fit  applied  to  a  simple  averaging  operator. 
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